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Abstract 

In this paper we define an efficient implementation of Runge-Kutta meth- 
ods of Radau IIA type, which are commonly used when solving stiff ODE- 
IVPs problems. The proposed implementation relies on an alternative low- 
rank formulation of the methods, for which a splitting procedure is easily 
defined. The linear convergence analysis of this splitting procedure exhibits 
excellent properties, which are confirmed by its performance on a few nu- 
merical tests. 
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1 Introduction 

The efficient numerical solution of implicit Runge-Kutta methods has been the 
subject of many investigations in the last decades, starting from the seminal paper 
of Butcher [T71 118] (see also [3]). An s-stage R-K method applied to the initial 
value problem 

y' = fiy), t/(to) = z/o e M'", (i) 
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yields a nonlinear system of dimension sm which takes the form 



G{y) = y-e®yo-hA®I f{y) = 0, 



(2) 



where 



/ 1 \ 



y 



\ys J 



f{y) 



( fivi) \ 

\ fiVs) J 



(3) 



yi,...,ys being the internal stages. It is common to solve (Ej) by a simplified 
Newton iteration, namely, for k = 0,1, . . . , 



yik+l) ^ y(k) _^ ^(fc)^ 

where J is the Jacobian of / evaluated at some intermediate point and y 
tial approximation of the stage vector, for instance J = ^{vo) and y^^'' 



(4) 



(0) 



an ini- 

e (g) yo. 

To reduce the computational efforts associated with the solution of dl]), a suitable 
linear change of variables on the s stages of the method is often introduced with 
the goal of simplifying the structure of the system itself. This is tantamount to 
performing a similarity transformation, commonly referred to as Butcher transfor- 
mation, that puts the coefficient matrix A of the R-K method in a simpler form, 
i.e. a diagonal or triangular matrix. Let B = TAT~^ such a transformation. 
System (jl]) becomes 



(/ - h{B ® J)) (T ® /)A('=) = -(T ® I)G{y 



(5) 



with the obvious advantage that the costs associated with the LU factorizations 
decrease from O(s^m^) to O(sm^) /^oj'sll] In particular, if A has a one-point 
spectrum one only needs a single LU decomposition and the cost further reduces 
to 0{m^) flops |T5]. However, for many fully implicit methods of interest, the 
matrix A possesses complex conjugate pairs of eigenvalues which will appear as 
diagonal entries in the matrix B. In such a case, it is computationally more 
advantageous to allow B to be block-diagonal, with each 2x2 diagonal block 
corresponding to a complex conjugate pair of eigenvalues of A. Each subsystem 
of dimension 2m is then turned into an m-dimensional complex system. This is 
the standard procedure used in the codes RADAU5 [23it30j and RADAU [211 150] . 
the former a variable-step fixed-order code, and the latter a variable-order variant, 
both based upon Radau-IIA formulae (of orders 5, 9, and 13). 



^One flop is an elementary /toating-point operation. 
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Subsequent attempts to derive implicit high-order methods, for which the dis- 
crete problem to be solved can be cast in a simplified form, have been made, e.g., 
in [U |19]. This line of investigation has been further refined in later papers (see, 
e.g., [211 [2Q]). Sometimes, the formulation of the discrete problem has been suit- 
ably modified, in order to induce a corresponding "natural splitting" procedure, 
as is done, e.g., in [U [IHl HI] (see also [H E]). 

A different approach to the problem is that of considering suitable splitting 
procedures for solving the generated discrete problems [2], |22l |25l [261 EZl EHl |29] . A 
particularly interesting splitting scheme, first introduced in |26], is that induced by 
the Grout factorization of the coefficient matrix A, namely A = LU, with L lower 
triangular and U upper triangular with unit diagonal entries. After observing that, 
for many remarkable R-K methods, the lower triangular part of A is dominant, in 
[26] the authors suggest to replace the matrix A in (|4|) with the matrix, L thus 
obtaining the scheme 

(/-/iL® J)A(^') = -G(y(^')), 

Compared to this scheme only requires the sequential solution of s subsystems 
of dimension m and therefore a global cost of O(sm^) elementary operations. 
Moreover, the s LU factorizations of the matrices / — la J [la being the ith 
diagonal entry of L), and the evaluations of the components of G{y^'^^) may be 
done in parallel. This is why the corresponding methods have been named parallel 
triangularly implicit iterated R-K methods (PTIRK). 

On the other hand, if the original modified Newton process (jl]) converges in 
one iterate on linear problems, the same no longer holds true for ([6]), due to the 
approximation A ~ L. Applying the method to the linear test equation y' = Xy 
yields the following estimation for the error e*-*^-' = y^^-' — y: 

e(^+i) = M(g)e('=), M{q) = q{I ~ qLy\A - L), (7) 

with q = hX. Matrix M{q) is referred to as the amplification matrix associated 
with the method and its properties infiuence the rate of convergence of the scheme 
(El) according to a first order convergence analysis (see Section H]) . 

In this paper we wish to combine both the approaches described above and 
epitomized at formulae ([5|) and ([6|), to derive an efficient implementation of Radau 
IIA methods on sequential computers. In fact, the above discussion begs the 
following question: is it possible to perform a change of variables of the stage 
vector such that, for the new system ([5|), the matrix B admits a LU factorization 
with constant diagonal entries? In the affirmative, a single LU factorization would 
be needed to solve ([6|), with a cost of only 0{m^) flops. A first positive answer in 
this direction has been given in |2j for general R-K methods. Later on, in [22], an 
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optimal splitting of the form ([6]) has been devised for the Radau IIA method of 
order three (two stages), with lu = l22- 

In this paper, we follow a different route, which relies on a low-rank formulation 
of Radau IIA collocation methods. Low-rank R-K methods have been recently 
introduced in a series of papers in the context of numerical geometric integration 
[SI El El El E] (see also [IB] for an application of low-rank R-K methods to stochastic 
differential equations). 

Furthermore, our aim is not to destroy the overall convergence features of the 
simplified Newton method (jlj). Thus, instead of ([6|), we first recast system (jlj) as 

(/ - h{L ® J)) A^'^) = h{{A-L)<^ J) A^'^) - G'(2/('=)), (8) 

and then, we retrieve an approximation of the unknown vector A'^'^^ by means of 
the inner iteration 

(/ - h{L ® J)) = h{{A-L)® J) AW - G(2/«), (9) 

starting at Aq*^^ = 0. The inner scheme ^ could be iterated to convergence or 
stopped after a suitable number, say r, of steps. We see that dHD corresponds 
to ([9]) performed with one single inner iteration. Considering that no function 
evaluations are needed during the implementation of ([9]), we aim to perform the 
minimum number r of inner iterations that does not alter the convergence rate of 
the outer iteration 

The convergence properties of the purely linear scheme ([21) continue to be 
described by the amplification matrix M{q) defined at ([7]). In fact, its iteration 
matrix is 

h{I -h{L® J))-\{A- L)® J), 

which reduces to M(g) for the individual components corresponding to the eigen- 
values A of J. An advantage of the change of variable we propose is that a fast 
convergence rate is guaranteed at the very first steps of the process, and we will 
show that, in many practical situations, choosing v < s produces very good results 
(see Table [3|). 

The paper is organized as follows. The low-rank formulation of Gauss Radau 
IIA methods is presented in Section [21 while the splitting procedure is defined in 
Section [31 Its convergence analysis and some comparisons with similar splitting 
procedures are reported in Section [H Section [5] is devoted to some numerical tests 
with the fortran 77 code RADAU5 [231 l3Dj , modified according to the presented 
procedure. Finally, a few conclusions are reported in Section [6l along with future 
directions of investigations. 



4 



2 Augmented low-rank implementation of Radau 
IIA methods 

The discrete problem generated by the apphcation of an s-stage (s > 2) Radau 
IIA method to problem ([1]) may be cast in vector form, by using the W-transform 
[23] . as: 

y = e0yo + hVX,V-'®Ifiy), (10) 
where e, y and f{y) are defined at ([2]), while the matrices V and Xg are defined 



as 



V 



/ Po(ci) ... Ps-i(ci) \ 

\ Po(Cs) ... Ps-l{Cs) J 



( \ 



V 



is-2 -e.-i 



with {Pj} the shifted and normalized Legendre polynomials on the interval [0, 1], 



and 



2 ^4^2 - 1 ■ 



1 S 



/3s 



4s -2 



Clearly, h is the step size and the abscissae {ci, . . . , Cs] are the Gauss- Radau nodes 
in [0, 1]. In particular, = 1, so that ys is the approximation to the true solution 
at the time ti = to + h. 

We now derive an augmented low-rank Runge-Kutta method, which is equiv- 
alent to ( TTOj) . by following an approach similar to that devised in |5] to introduce 
Hamiltonian boundary value methods (HBVMs), a class of energy-preserving R-K 
methods. In more detail, we choose an auxiliary set of distinct abscissae. 



< ci < ■ ■ ■ < = 1, 
and define the following change of variables involving the internal stages yi 

y = VV-^®Iy, 

( yi\ ( Po{ci) ■ 

\ys j \ Poics) ■ 



(12) 



(13) 



with 



y 



Ps-l{c^) \ 

Ps-liCs) J 



The vectors {yi}, i = 1, . . . , s, called auxiliary stages^ are nothing but the values 
at the abscissae f|T2l) of the polynomial interpolating the internal stages {yi}. Sub- 
stituting (|T3i) into ([2]) yields the new nonlinear system in the unknown y (notice 
that VV-^e = e): 

Giy) = y-e^yo~ hTX^V-^ ®If {vV-' ®ly)=0. (14) 

Of course, after computing ^, the solution must be advanced in the standard 
manner, that is by means of the last component, of the original stage vector 
y. However notice that Cs = Cs ^ ys = Vs, so that this step of the procedure is 
costless. 

In the next section, we show that the auxiliary abscissae f[T^ can be chosen so 
that the solution of the corresponding simplified Newton iteration (see 0151) below) 
is more efficient than solving (jlj). We end this section by noticing that system fll4p 
is actually identified by a R-K method with rank deficient coefficient matrix. 

Theorem 1 The method I[TS\) -[I4\) can be cast as a Runge-Kutta method with 
2s-stages, defined by the following Butcher tableau: 



c 







c 









0' 





where c, c are the vectors with the Radau abscissae and the auxiliary abscissae 
( fi^) . respectively, and b contains the weights of the Radau quadrature. 



3 The splitting procedure 

The simplified Newton iteration (see (j4])) applied to ( IT^ reads 

(/ - hVX.,V-' ® j) A('=) = -G(y^), 

y{k+l) ^ y{k) ^ ^{k) _ 

As we can see, its structure is precisely the same as that we would obtain by 
applying the simplified Newton iteration directly to the original system (11 01) . with 
the only difference that the matrix V in f|T5l) should be replaced by V. 

As was emphasized in the introduction, to simplify the structure of systems 
such as (IT^ . van der Houwen and de Swart [SHIEZI proposed to replace the matrix 

^They are called silent stages in the HBVMs terminology, since their presence does not alter 
the complexity of the resulting nonlinear system. Similarly, the abscissae (jl2p are called silent 
abscissae. 
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{VXsV~^) in (ITOj) with the lower triangular matrix L arising from its Grout factor- 
ization. The advantage is that, in such a case, to perform the iteration, one has to 
factorize s matrices having the same size m as that of the continuous problem with 
a noticeable saving of work. They show that on parallel computers this approach 
gives very interesting speedups over more standard approaches based on the use 
of the LU factorization. This is symptomatic of the fact that LU factorizations 
generally give a relevant contribution to the overall execution time of a given code. 

Similarly, here we want to take advantage from both the Grout factorization of 
{VXgV^^) appearing in f|T5|) and the freedom of choosing the auxiliary abscissae 
{cj}, to devise an iteration scheme that only requires a single LU factorization of 
a system of dimension m which is, therefore, suitable for sequential programming. 
Differently from [26], we continue to adopt the iteration ( 1T5|) (outer iteration) and 
retrieve an approximation to via the linear inner iteration 

(l-hL® j) = h ({VX,V-^ -L)®.j) A(^') - G(^(^)), z/ = 0, 1, . . . , 

(16) 

where 

VXsV-^ = LU, (17) 

with L lower triangular and U upper triangular with unit diagonal entries. Our 
purpose is to choose the auxiliary abscissae ( TT^ so that all the diagonal entries of 
L are equal to each other, i.e.. 



(L),, = Vdet(X,), j = l,...,s. (18) 

In so doing, one has to factor only one m x m matrix, to carry out the inner 
iteration ( IT6|) . Goncerning the diagonal entry in (fT8|) . the following result can be 
proved by induction. 

Theorem 2 Let Xs be defined according to / flT]) and let 

s I f 1, if s is even, 

^ = 1 + 2 - - s = S ^ • 

2 1^ 0, otherwise, 

with [-J the floor function. Then 

det(X,) = ^ . (19) 

nt2i;(4z2 _ 1) 

Consequently, from [T^) one has: 

2^-1 

ds:={L),j = r, J = l,...,s. (20) 

(n?iii;''(4^^-i: 
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In Table [T] we list the auxiliary abscissae {ci}i=i^,,,^s and the diagonal entries dg, 
given by (120|) . for the Radau IIA methods with s = 2, . . . , 5 stages. Notice that, 
having set Cg = 1, the free parameters are s — 1, namely q, i = l,...,s — 1. We 
have formally derived the expression of the first s — 1 diagonal entries of the matrix 
1/ as a function of these unknowns, {L)jj = £j{ci, . . . ,Cs), and then we have solved 
the (s — l)-dimensional system ij{ci, . . . , c^) = dg, j = 1, . . . , s — 1, with the aid 
of the symbolic computation software Maple. From f fT9|) it is clear that the last 
diagonal element of L will be automatically equal to ds, too. 

As was observed in [20] in the context of singly implicit R-K methods, the 
implementation of a formula such as flTB]) consists of a block-forward substitution 
which requires the computation of (T J)A[,^^, with 

T = L-dJ 

(i.e., the strictly lower triangular part of matrix L), at a cost of 0{s'^m + m^s) 
operations. The 0{m?s) term, as well as the m? multiplications for computing 
{hds)J before the factorization of the matrix / — hdgJ, may be eliminated by 
multiplying both sides of (|T6l) by 

Considering that 

= d;^I - S, 

with S strictly lower triangular, system f|T6|) then takes the form 

- ^ ® ^) ^Si = l(S® + iC®J) A(^) + i?^^), (21) 

where 

C = L-\VX,V-^ - L) = U - I and R^^^ = ~{L-^ ^ I)G{y'^'''>). 

Notice that, since C is strictly upper triangular, the multiplication of J by the 
first block-component of A^^'' may be skipped. But we can go another step beyond 
and completely eliminate any 0{m?) term in the computation of the term 

{c ® J) Al,^) 

at right-hand side of (1211) . This is true at the very first step, since, by definition, 

AJ') = 0. 
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Let us set 

:= {C®J) Al'^+R^'\ 

which is part of the right-hand side of f l2T|) . Thus Wq = R^''^ and the first step of 
fl2Tl) is equivalent to the system 

[{hd.y^I -I® J] Af ^ = h'\S ® /)Af ^ + Wo. (22) 

After solving for the unknown A^i\ we set Vi equal to the right-hand side of fl22|) . 
which can be exploited to compute the term 

(J® j)AS') = {hdsr'Af^ 

at a cost of 0{ms) operations. It follows that 



(C® J) Af^ = (C®/) {I^J)A\^> ={C®I) {hds)-'A\^> -vi 



(k) 



(23) 



and thus wi = {C (g) J) Af ^ + may be computed with O(s^m) fioating point 
operations. This trick may be repeated at the subsequent steps, thus resulting in 
the following algorithm: 



Wo 

do u 



0,1,... 

solve: [{hds)-H - / 8) J] Ai% = h-\S ® I)Al%^ + w, 
h-^{S^I)Ai% + w, 



(k) 



{C®I) ihd,)-'Ai%-v,^, 



end do 



Notice that v^+i is just the right-hand side of the preceding linear system and thus 
it is freely available as soon as the system has been solved. 



4 Convergence analysis and comparisons 

In this section we briefiy analyze the splitting procedure f|T6l) . This will be done 
according to the linear analysis of convergence in [26] (see also [13]). In such a 
case, problem ([1]) becomes the celebrated test equation 

y' = Xy, y{to) = yo- (24) 

By setting, as usual, q = hX, one then obtains that the error equation associated 
with f|T6l) is given by 

e,+i = M(g)e„ M(g) := q{I - qLy'L{U - /), = 0, 1, . . . , (25) 
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Table 1: Auxiliary abscissae for the s-stage Radau method, s = 2, . . . , 5, and the 
diagonal entry is (see [20j) of the corresponding factor L. 



s = 2 


Cl 


(6 - VQ)/{Q + 2v^) 




1 


d2 


0.40824829046386301636621401245098 


s = 3 


Cl 


0.18589230221764097222357873465176 


C2 


0.50022434784008286059148415923632 


C3 


1 


ds 


0.25543647746451770219954184281099 


s = 4 


Cl 


0.12661575733255931078112184952036 


C2 


0.34154548143311325099490740728171 


C3 


0.56937072098419698874387077046544 


C4 


1 




0.18575057999133599176307088298897 


s = 5 


Cl 


0.09527975140867214336447374571157 


C2 


0.28143874673988994521203045137949 


C3 


0.38152142820340929736570124768463 


C4 


0.60680555490108389442461323421422 


C5 


1 


0^5 


0.14591154019899779261811749554182 



where we have set = — A*^'^^ that is the error vector at step u (we neglect, 
for sake of simplicity, the index k of the owter iteration) and M{q) is the iteration 
matrix induced by the splitting procedure. This latter will converge if and only if 
its spectral radius, 

p(g) :=p(M(g)), 

is less than 1. The region of convergence of the iteration is then defined as 

3 = {qeC : p{q) < 1}. 

The iteration is said to be A-convergent if C D. If, in addition, the stijf 
amplification factor, 

P°° := lim p(g), 

is null, then the iteration is said to be L-convergent. Clearly, A-convergent it- 
erations are appropriate for A-stable methods, and L-convergent iterations are 
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appropriate for L-stable methods. In our case, since 

M{q)^{U-I), q-^oo, (26) 

which is a nilpotent matrix of index s, the iteration is L-convergent if and only if 
it is A- convergent. Since the iteration is well defined for all q G (due to the 
fact that the diagonal entry of L, ds, is positive) and p(0) = 0, A-conergence, in 
turn, is equivalent to require that the maximum amplification factor, 

p* = max p(ix), 

is not larger than 1. Another useful parameter is the nonstiff amplification factor, 

p = p{L{U-I)), (27) 

that governs the convergence for small values of q since 

p(g) ~ pq, for g ^ 0. 

Clearly, the smaller p* and p, the better the convergence properties of the iteration. 
In Table [2] we list the nonstiff amplification factors and the maximum amplification 
factors for the following L-convergent iterations applied to the s-stage Radau IIA 
methods: 



(i) the iteration obtained by the original triangular splitting in 

(ii) the iteration obtained by the modified triangular splitting in [2]; 

(iii) the blended iteration obtained by the blended implementation of the methods, 
as defined in [TOj : 

(iv) the iteration defined by ffTB]) . 

We recall that the scheme (i) (first column) requires s real factorizations per it- 
eration, whereas (ii)-(iv) only need one factorization per iteration. From the pa- 
rameters listed in the table, one concludes that the proposed splitting procedure 
is the most effective among all the considered ones. 

It is worth mentioning that the above amplification factors are defined in terms 
of the eigenvalues of the involved matrices. Therefore, they are significant if a large 
number of inner iterations are performed or if the initial guess is accurate enough. 
In the computational practice, the number of inner iteration is usually small, so 
that it is also useful to check the so called averaged amplification factors over v 
iterations, defined as follows (see (1771) and 



Pv 



L{U-I)\ , p: = max Vl|M(zx)1|, p^ = J - I) 
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Table 2: Amplification factors for the triangular splitting in [26], the modified 
triangular splitting in [2], the blended iteration in [1^, and the splitting f lT6|l . for 
the s-stage Radau IIA methods. 





(i): triangular 
splitting in [26J 


(ii): triangular 
splitting in [2] 


(iii): blended 
iteration in ilOj 


(iv): triangular 
splitting (ITBl) 


s 


P P* 


P P* 


P P* 


P P* 


2 
3 
4 
5 


0.1500 0.1837 
0.1853 0.3726 
0.1728 0.5064 
0.1496 0.6103 


0.1498 0.1835 
0.1375 0.3138 
0.1236 0.4137 
0.1090 0.4949 


0.1498 0.1835 
0.1674 0.3398 
0.1535 0.4416 
0.1367 0.5123 


0.1498 0.1835 
0.1333 0.3134 
0.1174 0.3826 
0.0787 0.3963 



Table 3: Amplification factors, and averaged amplification factors after s inner 
iterations and 1 inner iteration, for the triangular splitting f|T6|) . for the s-stage 
Radau IIA methods. 



s 


P P* 


Ps P*s 


Pi Pi pT 


2 

3 
4 
5 


0.1498 0.1835 
0.1333 0.3134 
0.1174 0.3826 
0.0787 0.3963 


0.1498 0.1835 
0.1407 0.3378 
0.1316 0.4363 
0.1200 0.5841 


0.1498 0.2020 0.2020 
0.1513 0.3984 0.3440 
0.2169 0.6643 0.5172 
0.2959 1.1141 0.9945 



Clearly, 

P^ = 0, Vz/ > s, 
since matrix U — I is nilpotent of index s. Moreover, 

py — 7- p, Pu P*5 as z/ — )■ oo. 

For this reason, in Table |3] we compare the asymptotic parameters p and p* 
(columns 2 and 3) with the averaged ones over s iterations (columns 4 and 5), 
for s = 2, . . . , 5. As one can see, the iterations are still L-convergent after s it- 
erations (the norm || ■ ||oo has been considered). In the last three columns of the 
table, we list the amplification factors after just 1 inner iteration: in such a case, 
the iterations are no more L-convergent, though still A- convergent, up to s = 4. 

5 Numerical Tests 

In this section, we report a few results on three stiff problems taken from the Test 
Set for I VP Solvers [30]: 
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• Elastic Beam problem, of dimension m = 80; 

• Emep problem, of dimension m = 66; 

• Ring Modulator problem, of dimension m = 15. 

All problems have been solved by using the RADAU5 code [231. 130] and a suitable 
modification of it which implements the splitting procedure with a fixed number 
of inner iterations, namely u = 1,2, 3. 

Clearly, further improvements could be obtained by dynamically varying the 
number of inner iterations as well as by implementing a suitable strategy, well 
tuned for the new iterative procedure, to decide whether the evaluation of the 
Jacobian can be avoided. In absence of such refinements, in order to verify the ef- 
fectiveness of the proposed approach, we have forced the evaluation of the Jacobian 
after every accepted step by setting in input work(3)=-lD0. As a consequence, 
the factorization of the involved matrices is computed at each integration step. 

All the experiments have been done on a PC with an Intel Core2 Quad Q9400 
@ 2.66GHz processor under Linux by using the GNU Fortran compiler gf ortran 
with optimization flag -Of ast. 

The following input tolerances for the relative {rtol) and absolute {atol) errors 
and initial stepsizes (ho) have been used: 

• Elastic Beam problem: rtol = atol = ho = 10~^~*/^, i = 0, . . . , 16; 

• Emep problem: rtol = 10"^"*/"^, z = 0, . . . , 28, atol = 1 and Iiq = 10"''; 

• Ring Modulator problem: rtol = atol = Iiq = 10^^"*/"^, i = 0, . . . , 20. 

Figures [H [21 and [3] show the obtained results as work-precision diagrams, where 
the CPU-time in seconds is plotted versus accuracy, measured as mixed- error sig- 
nificant correct digits (mescd), defined as 

-logio max + Iz/il)}, 

j=l,...,m 

Cj being the error in the ith entry of the solution at the end of the trajectory, and 
Hi the corresponding reference value (which is known, for all problems in the Test 
Set). 

For the first two problems, the work-precision diagrams suggest that the split- 
ting version of the RADAU5 code is more efficient than the original one, even 
starting with 1 inner iteration. Moreover, in Tables |1H7| we list a few statistics 
for the Elastic Beam problem, from which one deduces that, by using 2-3 inner 
iterations, the number of steps is approximately the same as the original code: in 
other words, the convergence rate of the outer iteration is preserved. 
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■ ■▼■ Radau 5 
- ♦ - Splitting 1 inner it. 
— • — Splitting 2 inner it. 
-■- Splitting 3 inner it. 

3 3.5 4 4.5 5 5.5 

mescd 

Figure 1: Work precision diagram for the Elastic Beam problem. 

For the last problem (Ring Modulator), which has a much smaller size, the 
splitting with 2 and 3 inner iterations is less efficient than the original RADAU5 
code. Nevertheless, when using a single inner iteration the algorithm uses a larger 
number of steps (8-10% more), as is shown in Tables |H] and resulting into a much 
more accurate solution. In our understanding, this behaviour may be explained 
by considering that computing the vector field f{t, y) of this problem is extremely 
cheap and hence accuracy is more conveniently obtained by acting on the number 
of function evaluations rather than on the number of inner iterations. 

6 Conclusions 

In this paper we have defined a splitting procedure for Radau IIA methods, derived 
by an augmented low-rank formulation of the methods. In such formulation, a 
set of auxiliary abscissae are determined such that the Crout factorization of a 
corresponding matrix associated with the method has constant diagonal entries. 
In such a case, the complexity of the iteration is optimal. Moreover, the presented 
iteration compares favorably with all previously defined iterative procedures for 
the efficient implementation of Radau IIA methods. The presented technique can 
be straightforwardly extended to other classes of implicit Runge-Kutta methods 
(e.g., collocation methods) and this will be the subject of future investigations. 
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Figure 2: Work precision diagram for the Emep problem. 
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igure 3: Work precision diagram for the Ring Modulator problem. 
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Table 4: Statistics for the Elastic Beam problem, RADAU5. 



rtol 


mescd 


steps 


accept 


feval 


jeval 


LU 


CPU-time 


l.OOE-04 


3.36 


55 


49 


380 


49 


55 


6.24E-02 


l.OOE-05 


3.67 


112 


95 


764 


95 


112 


1.23E-01 


l.OOE-06 


3.78 


162 


146 


1103 


146 


162 


1.81E-01 


l.OOE-07 


4.18 


275 


251 


1853 


251 


275 


3.06E-01 


l.OOE-08 


4.69 


507 


459 


3417 


459 


507 


5.62E-01 



Table 5: Statistics for the Elastic Beam problem, RADAU5, split 1. 



rtol 


mescd 


steps 


accept 


feval 


jeval 


LU 


CPU-time 


l.OOE-04 


3.20 


74 


66 


870 


66 


74 


5.12E-02 


l.OOE-05 


3.76 


117 


105 


1443 


105 


117 


8.40E-02 


l.OOE-06 


3.95 


193 


177 


2769 


177 


193 


1.44E-01 


l.OOE-07 


4.35 


374 


330 


5925 


330 


374 


2.80E-01 


l.OOE-08 


5.02 


801 


655 


12814 


655 


801 


5.84E-01 



Table 6: Statistics for the Elastic Beam problem, RADAU5, split 2. 



rtol 


mescd 


steps 


accept 


feval 


jeval 


LU 


CPU-time 


l.OOE-04 


3.57 


66 


56 


548 


56 


66 


4.68E-02 


l.OOE-05 


3.71 


112 


96 


879 


96 


112 


7.76E-02 


l.OOE-06 


3.76 


152 


144 


1290 


144 


152 


1.12E-01 


l.OOE-07 


4.20 


284 


260 


2603 


260 


284 


2.10E-01 


l.OOE-08 


4.72 


517 


481 


5044 


481 


517 


3.89E-01 



Table 7: Statistics for the Elastic Beam problem, RADAU5, split 3. 



rtol 


mescd 


steps 


accept 


feval 


jeval 


LU 


CPU-time 


l.OOE-04 


3.53 


64 


54 


454 


54 


64 


4.76E-02 


l.OOE-05 


3.67 


115 


96 


810 


96 


115 


8.32E-02 


l.OOE-06 


3.74 


154 


141 


1104 


141 


154 


1.16E-01 


l.OOE-07 


4.17 


273 


249 


1959 


249 


273 


2.04E-01 


l.OOE-08 


4.68 


502 


456 


3654 


456 


502 


3.74E-01 
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Table 8: Statistics for the Ring Modulator problem, RADAU5. 



rtol 


mescd 


steps 


accept 


feval 


jeval 


LU 


CPU-time 


1 OOE-07 


4.42 


98754 


89346 


510295 


89346 


98754 


1.37E+00 




5.20 


137823 


128316 


727506 


128316 


137823 


1.92E+00 


1 nnF,-nQ 


5.96 


194463 


185008 


1046747 


185008 


194463 


2.74E+00 


1 nnF,-in 


6.75 


277830 


268414 


1525756 


268414 


277830 


3.94E+00 


1 DDF,-! 1 

X . W W XI/ X X 


7.52 


399846 


390508 


2234881 


390508 


399846 


5.71E+00 


1 DDF- 19 

X . W W i J X 


8.30 


580535 


571309 


3365783 


571309 


580535 


8.42E+00 


Table 9 


: Statistics for the Ring Modulator problem, RADAU5, split 1. 


rtol 


mescd 


steps 


accept 


feval 


jeval 


LU 


CPU-time 


l.OOE-07 


4.97 


110376 


95269 


958749 


95269 


110376 


1.54E+00 


l.OOE-08 


5.91 


152526 


136231 


1328822 


136231 


152526 


2.13E+00 


l.OOE-09 


6.93 


212686 


195982 


1855438 


195982 


212686 


2.98E+00 


l.OOE-10 


8.36 


301719 


283921 


2635810 


283921 


301719 


4.23E+00 


l.OOE-11 


8.75 


432000 


412643 


3785978 


412643 


432000 


6.07E+00 


l.OOE-12 


9.05 


624708 


602385 


5524392 


602385 


624708 


8.84E+00 
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